source("./Mean Reversion/RMR.001 Load Packages.R")
pricing_data <- read_csv("./Mean Reversion/Raw Data/pricing data.csv")
## Parsed with column specification:
## cols(
## date_unix = col_integer(),
## date_time = col_datetime(format = ""),
## high = col_double(),
## low = col_double(),
## open = col_double(),
## close = col_double(),
## volume = col_double(),
## quote_volume = col_double(),
## weighted_average = col_double(),
## currency_pair = col_character(),
## period = col_integer()
## )
Description
Spreads Poloniex pricing data into wide format and filters it to a specified time resolution and time window.
Arguments
pricing_data: A dataframe containing pricing data from Poloniex gathered in tidy format.
time_resolution: The number of seconds that each observation spans. Takes values 300, 900, 1800, 7200, 14400, and 86400.
start_date: The start date of the time window.
end_date: The end date of the time window.
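As a rough illustration of the filter-then-widen step described above, the sketch below uses base R only (`reshape()` rather than `spread()`) so that it runs without any packages. The toy prices and the names `toy`, `keep`, and `wide` are assumptions made for this example and are not part of the original data.

```r
# Toy tidy pricing data: one row per (timestamp, currency pair, period).
toy <- data.frame(
  date_unix     = rep(c(1483228800L, 1483229100L), times = 3),
  currency_pair = rep(c("USDT_BTC", "USDT_ETH", "USDT_BTC"), each = 2),
  close         = c(1000, 1010, 8.0, 8.2, 999, 1005),
  period        = rep(c(300L, 300L, 900L), each = 2),
  stringsAsFactors = FALSE
)

# Keep a single time resolution and a time window (here: the whole toy range).
keep <- toy$period == 300L &
  toy$date_unix >= 1483228800L &
  toy$date_unix <= 1483229100L

# Widen: one row per timestamp, one column of close prices per currency pair.
wide <- reshape(
  toy[keep, c("date_unix", "currency_pair", "close")],
  idvar = "date_unix", timevar = "currency_pair", direction = "wide"
)
names(wide) <- sub("^close\\.", "", names(wide))
wide$date_time <- as.POSIXct(wide$date_unix, origin = "1970-01-01", tz = "UTC")
print(wide)
```

The 900-second rows are dropped by the filter, so each remaining timestamp contributes exactly one row to the wide table.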
prepare_data <- function(pricing_data, time_resolution, start_date, end_date) {
df <- pricing_data %>%
filter(period == time_resolution,
date_time >= start_date,
date_time <= end_date) %>%
select(date_unix, date_time, close, currency_pair) %>%
spread(currency_pair, close)
return(df)
}
Description
The Engle-Granger method is used to test for cointegration. It consists of two steps: (1) Perform a linear regression of log(coin_y) on log(coin_x). (2) Perform an Augmented Dickey-Fuller (ADF) test on the residuals from the regression estimated in (1). The ADF test is specified with a non-zero mean (drift), no time trend, and one autoregressive lag. The function returns the ADF test statistic, which is the t-statistic on the lagged residual level.
Arguments
coin_y: A vector containing the pricing data for the dependent coin in the regression.
coin_x: A vector containing the pricing data for the independent coin in the regression.
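To make the two steps concrete, here is a minimal base R sketch that builds the drift-plus-one-lag ADF regression by hand instead of calling `ur.df()`. The simulated prices, the seed, and all variable names are assumptions for illustration only; the regression layout is intended to mirror the test specification described above.

```r
set.seed(42)
n <- 500

# Simulated log prices: coin_x follows a random walk, coin_y tracks it with
# stationary noise, so the pair is cointegrated by construction.
log_x <- log(100) + cumsum(rnorm(n, sd = 0.01))
log_y <- 0.5 + 0.8 * log_x + rnorm(n, sd = 0.02)
coin_x <- exp(log_x)
coin_y <- exp(log_y)

# Step 1: regress log(coin_y) on log(coin_x) and keep the residuals.
e <- unname(residuals(lm(log(coin_y) ~ log(coin_x))))

# Step 2: ADF regression with drift and one lagged difference:
#   diff(e)[t] = a + gamma * e[t-1] + phi * diff(e)[t-1] + error
de      <- diff(e)
lagged  <- embed(de, 2)      # column 1: current difference, column 2: previous
d_e     <- lagged[, 1]
d_e_lag <- lagged[, 2]
e_lag   <- e[2:(n - 1)]      # residual level one step before d_e

adf_fit  <- lm(d_e ~ e_lag + d_e_lag)
adf_stat <- summary(adf_fit)$coefficients["e_lag", "t value"]
print(adf_stat)
```

A strongly negative statistic indicates that the residuals revert to their mean, which is what the strategy relies on.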
test_cointegration <- function(coin_y, coin_x) {
lm_model <- lm(log(coin_y) ~ log(coin_x))
lm_residuals <- lm_model[["residuals"]]
adf_test <- ur.df(lm_residuals, type = "drift", lags = 1)
df_stat <- adf_test@testreg[["coefficients"]][2, 3]
return(df_stat)
}
Description
Two sets of currency pairs are examined: currency pairs where USDT is the quote currency and currency pairs where BTC is the quote currency. All combinations of coins within each set are created. Combinations that consist of the coin with itself are removed. The function returns a dataframe containing the coin pairs.
create_pairs <- function() {
coins_usdt <- c("USDT_BTC", "USDT_DASH", "USDT_ETH", "USDT_LTC", "USDT_REP", "USDT_XMR", "USDT_ZEC")
coins_btc <- c("BTC_DASH", "BTC_ETH", "BTC_LTC", "BTC_REP", "BTC_XEM", "BTC_XMR", "BTC_ZEC")
coin_pairs <- rbind(expand.grid(coins_usdt, coins_usdt), expand.grid(coins_btc, coins_btc)) %>%
rename(coin_y = Var1,
coin_x = Var2) %>%
filter(coin_y != coin_x) %>%
mutate_if(is.factor, as.character) %>%
as_tibble()
return(coin_pairs)
}
Description
Test for cointegration between each coin pair generated by the create_pairs() function. The test for cointegration is performed by the test_cointegration() function. The function returns a dataframe containing the coin pairs and the ADF test statistic resulting from testing cointegration between each coin pair.
Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
coin_pairs: A dataframe generated by create_pairs().
test_pairs <- function(train, coin_pairs) {
adf_stat <- c()
for (n in 1:nrow(coin_pairs)) {
coin_y <- coin_pairs[[n, "coin_y"]]
coin_x <- coin_pairs[[n, "coin_x"]]
cointegration_results <- test_cointegration(coin_y = train[[coin_y]], coin_x = train[[coin_x]])
adf_stat <- c(adf_stat, cointegration_results)
}
df <- coin_pairs %>%
mutate(adf_stat = adf_stat) %>%
arrange(adf_stat)
return(df)
}
Description
Select cointegrated coin pairs to be used in a mean reversion strategy. The current selection logic keeps all coin pairs whose ADF test statistic is less than or equal to -3.43.
Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
coin_pairs: A dataframe generated by create_pairs().
select_pairs <- function(train, coin_pairs) {
df <- test_pairs(train = train, coin_pairs = coin_pairs) %>%
filter(adf_stat <= -3.43)
return(df)
}
Description
Generate trading signals that indicate the current position in the spread formed by a linear combination of coin y and coin x. A signal of +1 indicates a long position in the spread, 0 indicates a flat position, and -1 indicates a short position in the spread. Signals are generated for the test set using a model trained on the training set.
The current trading logic is to perform a linear regression of log(coin y) on log(coin x) using the training set. A spread is then calculated in the test set using the fitted hedge ratio and intercept from the regression. The z-score of the spread is then calculated using the mean and standard deviation of the training-set residuals. A position is entered when the z-score reaches +threshold_z or -threshold_z (currently 2) and is exited when the z-score reaches 0. Losing positions are also exited when the z-score reaches +4 or -4 and are re-entered when it returns to within the -4 to +4 range.
Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
test: A dataframe generated by prepare_data() that represents the test set for the coin pair.
coin_y: A string indicating the dependent coin in the coin pair regression.
coin_x: A string indicating the independent coin in the coin pair regression.
threshold_z: A number indicating the absolute value of the z-score threshold for entering a position in the spread.
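The entry, exit, and stop rules above can be traced on a short, made-up z-score series. This sketch uses base R only: `lag1()` and `locf()` are hypothetical helpers standing in for `lag()` and `na.locf()`, and the values in `z` are invented for illustration.

```r
# Hypothetical helpers: shift by one step and carry the last value forward.
lag1 <- function(x) c(NA, head(x, -1))
locf <- function(x) {
  idx <- cumsum(!is.na(x))
  c(NA, x[!is.na(x)])[idx + 1]
}

z           <- c(0.5, -2.3, -1.0, 0.2, 2.5, 4.5, 3.0, -0.1)  # toy spread z-scores
threshold_z <- 2
stop_z      <- 4
prev        <- lag1(z)   # signals react to the previous observation

# Long leg: enter at or below -threshold, exit at or above 0, stop out at or below -4.
signal_long <- ifelse(prev <= -threshold_z, 1, NA)
signal_long <- ifelse(prev >= 0, 0, signal_long)
signal_long <- ifelse(prev <= -stop_z, 0, signal_long)
signal_long <- locf(signal_long)

# Short leg: mirror image of the long leg.
signal_short <- ifelse(prev >= threshold_z, -1, NA)
signal_short <- ifelse(prev <= 0, 0, signal_short)
signal_short <- ifelse(prev >= stop_z, 0, signal_short)
signal_short <- locf(signal_short)

signal <- signal_long + signal_short
signal <- ifelse(is.na(signal), 0, signal)
print(signal)
```

In this toy series the short position opened after z = 2.5 is closed after the spike to 4.5 and reopened once the z-score falls back to 3.0, which is the re-entry behaviour described above.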
generate_signals <- function(train, test, coin_y, coin_x, threshold_z) {
model <- lm(log(train[[coin_y]]) ~ log(train[[coin_x]]))
intercept <- coef(model)[1]
hedge_ratio <- coef(model)[2]
df_signals <- test %>%
mutate(spread = log(test[[coin_y]]) - log(test[[coin_x]]) * hedge_ratio - intercept,
spread_z = (spread - mean(model[["residuals"]])) / sd(model[["residuals"]]),
signal_long = ifelse(lag(spread_z, 1) <= -threshold_z, 1, NA),
signal_long = ifelse(lag(spread_z, 1) >= 0, 0, signal_long),
signal_long = ifelse(lag(spread_z, 1) <= -4, 0, signal_long),
signal_long = na.locf(signal_long, na.rm = FALSE),
signal_short = ifelse(lag(spread_z, 1) >= threshold_z, -1, NA),
signal_short = ifelse(lag(spread_z, 1) <= 0, 0, signal_short),
signal_short = ifelse(lag(spread_z, 1) >= 4, 0, signal_short),
signal_short = na.locf(signal_short, na.rm = FALSE),
signal = signal_long + signal_short,
signal = ifelse(is.na(signal), 0, signal))
return(df_signals[["signal"]])
}
Description
Calculate the return of a cointegration-based mean reversion trading strategy using coin y and coin x.
The current backtesting logic uses signals generated by generate_signals(). The coin_y_return and coin_x_return indicate the one period percentage return of each coin. The coin_y_position and coin_x_position indicate the market value in USD in each coin. coin_y_pnl and coin_x_pnl indicate the USD value of the profit and loss for each coin. The combined_position indicates the gross market value of the combined positions.
Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
test: A dataframe generated by prepare_data() that represents the test set for the coin pair.
coin_y: A string indicating the dependent coin in the coin pair regression.
coin_x: A string indicating the independent coin in the coin pair regression.
threshold_z: A number indicating the absolute value of the z-score threshold for entering a position in the spread.
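A small worked example may help in following the profit and loss bookkeeping. The prices, signals, and hedge ratio below are invented, and `lag1()` is a hypothetical stand-in for `lag()`; the arithmetic follows the column definitions given in the description.

```r
lag1 <- function(x) c(NA, head(x, -1))

price_y     <- c(100, 102, 101, 103)   # toy prices for coin y
price_x     <- c(50, 50.5, 50, 51)     # toy prices for coin x
signal      <- c(1, 1, 0, 0)           # long the spread for two periods, then flat
hedge_ratio <- 0.8                     # assumed value, normally taken from the regression

# One-period percentage returns of each coin.
ret_y <- price_y / lag1(price_y) - 1
ret_x <- price_x / lag1(price_x) - 1

# Positions: one unit of coin y against hedge_ratio units of coin x, opposite sign.
pos_y <- signal * 1
pos_x <- signal * hedge_ratio * -1

# P&L is earned on the position held at the end of the previous period.
pnl   <- lag1(pos_y) * ret_y + lag1(pos_x) * ret_x
gross <- abs(pos_y) + abs(pos_x)

combined_return <- pnl / lag1(gross)
combined_return[is.na(combined_return)] <- 0   # also catches 0 / 0 = NaN when flat
return_pair <- cumprod(1 + combined_return)
print(return_pair)
```

In the second period the spread earns 0.02 on coin y and loses 0.8 * 0.01 on the short coin x leg, so the return on the gross position of 1.8 is 0.012 / 1.8.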
backtest_pair <- function(train, test, coin_y, coin_x, threshold_z) {
model <- lm(log(train[[coin_y]]) ~ log(train[[coin_x]]))
intercept <- coef(model)[1]
hedge_ratio <- coef(model)[2]
df_backtest <- test %>%
mutate(signal = generate_signals(train = train,
test = test,
coin_y = coin_y,
coin_x = coin_x,
threshold_z = threshold_z),
coin_y_return = test[[coin_y]] / lag(test[[coin_y]], 1) - 1,
coin_x_return = test[[coin_x]] / lag(test[[coin_x]], 1) - 1,
coin_y_position = signal * 1,
coin_x_position = signal * hedge_ratio * -1,
coin_y_pnl = lag(coin_y_position, 1) * coin_y_return,
coin_x_pnl = lag(coin_x_position, 1) * coin_x_return,
combined_position = abs(coin_y_position) + abs(coin_x_position),
combined_pnl = coin_y_pnl + coin_x_pnl,
combined_return = combined_pnl / lag(combined_position, 1)) %>%
mutate_all(funs(ifelse(is.na(.), 0, .))) %>%
mutate(return_pair = cumprod(1 + combined_return))
return(df_backtest[["return_pair"]])
}
Description
Calculate the return of a cointegration-based mean reversion trading strategy using an equally weighted portfolio of cointegrated coin pairs.
Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
test: A dataframe generated by prepare_data() that represents the test set for the coin pair.
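selected_pairs: A dataframe generated by select_pairs() that represents a set of cointegrated coin pairs.
threshold_z: A number indicating the absolute value of the z-score threshold for entering a position in the spread.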
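The equal weighting amounts to averaging the cumulative return curves of the pairs at each timestamp. A minimal base R sketch with two made-up curves follows; `pair_curves` and its values are assumptions for illustration.

```r
# Toy cumulative return curves for two pairs over three timestamps.
pair_curves <- data.frame(
  date_time   = rep(1:3, times = 2),
  pair        = rep(c("A", "B"), each = 3),
  return_pair = c(1.00, 1.02, 1.01,
                  1.00, 0.99, 1.03)
)

# Equal-weighted strategy curve: mean of the pair curves at each timestamp.
return_strategy <- tapply(pair_curves$return_pair, pair_curves$date_time, mean)
print(return_strategy)
```

Because the curves themselves are averaged, this corresponds to splitting capital equally across pairs at the start of the test window without rebalancing afterwards.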
backtest_strategy <- function(train, test, selected_pairs, threshold_z) {
df <- tibble()
for (i in 1:nrow(selected_pairs)) {
single_pair <- tibble(
return_pair = backtest_pair(train = train,
test = test,
coin_y = selected_pairs[["coin_y"]][i],
coin_x = selected_pairs[["coin_x"]][i],
threshold_z = threshold_z),
coin_y = selected_pairs[["coin_y"]][i],
coin_x = selected_pairs[["coin_x"]][i],
date_time = test[["date_time"]]
)
df <- bind_rows(df, single_pair)
}
df <- df %>%
group_by(date_time) %>%
summarise(return_strategy = mean(return_pair))
return(df[["return_strategy"]])
}
Description
Create plots of a cointegration-based mean reversion trading strategy for a single coin pair comprised of coin y and coin x. This function creates two plots. The first plot displays the spread transformed into a z-score, with three red lines at -2, 0, and 2. A green line indicates the signal, which can take the values -1, 0, and +1. The second plot displays the cumulative return of the model in blue. Two additional lines show the buy and hold return of coin y and coin x as red and green lines, respectively.
Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
test: A dataframe generated by prepare_data() that represents the test set for the coin pair.
coin_y: A string indicating the dependent coin in the coin pair regression.
coin_x: A string indicating the independent coin in the coin pair regression.
threshold_z: A number indicating the absolute value of the z-score threshold for entering a position in the spread.
plot_single <- function(train, test, coin_y, coin_x, threshold_z) {
model <- lm(log(train[[coin_y]]) ~ log(train[[coin_x]]))
intercept <- coef(model)[1]
hedge_ratio <- coef(model)[2]
df_plot <- test %>%
mutate(spread = log(test[[coin_y]]) - log(test[[coin_x]]) * hedge_ratio - intercept,
spread_z = (spread - mean(model[["residuals"]])) / sd(model[["residuals"]]),
signal = generate_signals(train = train,
test = test,
coin_y = coin_y,
coin_x = coin_x,
threshold_z = threshold_z),
return_pair = backtest_pair(train = train,
test = test,
coin_y = coin_y,
coin_x = coin_x,
threshold_z = threshold_z),
return_buyhold_y = test[[coin_y]] / test[[coin_y]][1],
return_buyhold_x = test[[coin_x]] / test[[coin_x]][1])
print(ggplot(df_plot, aes(x = date_time)) +
geom_line(aes(y = spread_z, colour = "Spread Z"), size = 1) +
geom_line(aes(y = signal, colour = "Signal"), size = 0.5) +
geom_hline(yintercept = 0, colour = "red", alpha = 0.5) +
geom_hline(yintercept = 2, colour = "red", alpha = 0.5) +
geom_hline(yintercept = -2, colour = "red", alpha = 0.5) +
scale_color_manual(name = "Series",
values = c("Spread Z" = "blue",
"Signal" = "green")) +
labs(title = "Spread vs Trading Signal",
subtitle = str_c(coin_y, " and ", coin_x),
x = "Date",
y = "Spread and Signal"))
print(ggplot(df_plot, aes(x = date_time)) +
geom_line(aes(y = return_pair, colour = "Model"), size = 1) +
geom_line(aes(y = return_buyhold_y, colour = "Coin Y"), size = 0.5, alpha = 0.4) +
geom_line(aes(y = return_buyhold_x, colour = "Coin X"), size = 0.5, alpha = 0.4) +
geom_hline(yintercept = 1, colour = "black") +
scale_color_manual(name = "Return",
values = c("Model" = "darkblue",
"Coin Y" = "darkred",
"Coin X" = "darkgreen")) +
labs(title = "Model Return vs Buy Hold Return",
subtitle = str_c(coin_y, " and ", coin_x),
x = "Date",
y = "Cumulative Return"))
}
Description
Create many plots by calling the plot_single() function multiple times. Also creates a plot showing the results of the overall strategy. Creates a train and test set surrounding a cutoff date and creates plots for the top 10 selected coin pairs ranked by their ADF statistic.
Arguments
pricing_data: A dataframe containing pricing data from Poloniex gathered in tidy format.
time_resolution: The number of seconds that each observation spans. Takes values 300, 900, 1800, 7200, 14400, and 86400.
cutoff_date: A date representing the cutoff date between the train and test sets.
train_window: A period object from the lubridate package representing the length of time the train set covers.
test_window: A period object from the lubridate package representing the length of time the test set covers.
threshold_z: A number indicating the absolute value of the z-score threshold for entering a position in the spread.
plot_many <- function(pricing_data, time_resolution, cutoff_date, train_window, test_window, threshold_z) {
train <- prepare_data(pricing_data = pricing_data,
time_resolution = time_resolution,
start_date = as.Date(cutoff_date) - train_window,
end_date = as.Date(cutoff_date))
test <- prepare_data(pricing_data = pricing_data,
time_resolution = time_resolution,
start_date = as.Date(cutoff_date),
end_date = as.Date(cutoff_date) + test_window)
selected_pairs <- select_pairs(train = train, coin_pairs = create_pairs())
test <- test %>%
mutate(return_strategy = backtest_strategy(train = train,
test = .,
selected_pairs = selected_pairs,
threshold_z = threshold_z))
print(selected_pairs)
for (i in seq_len(min(10, nrow(selected_pairs)))) {
plot_single(train = train,
test = test,
coin_y = selected_pairs[["coin_y"]][i],
coin_x = selected_pairs[["coin_x"]][i],
threshold_z = threshold_z)
}
ggplot(test, aes(x = date_time)) +
geom_line(aes(y = return_strategy, colour = "Strategy"), size = 1) +
geom_line(aes(y = USDT_BTC / USDT_BTC[1], colour = "USDT_BTC"), size = 0.5, alpha = 0.4) +
geom_hline(yintercept = 1, colour = "black") +
scale_color_manual(name = "Return",
values = c("Strategy" = "darkblue",
"USDT_BTC" = "darkred")) +
labs(title = "Strategy Return vs Buy Hold Return",
x = "Date",
y = "Cumulative Return")
}
time_resolution <- 300
train_window <- days(16)
test_window <- days(8)
test_by <- "8 days"
threshold_z <- 2
plot_many(pricing_data = pricing_data,
time_resolution = time_resolution,
cutoff_date = "2017-09-01",
train_window = train_window,
test_window = test_window,
threshold_z = threshold_z)
## # A tibble: 16 x 3
## coin_y coin_x adf_stat
## <chr> <chr> <dbl>
## 1 USDT_DASH USDT_ZEC -4.963572
## 2 USDT_ZEC USDT_DASH -4.824437
## 3 BTC_DASH BTC_ZEC -4.619483
## 4 BTC_ZEC BTC_DASH -4.586119
## 5 BTC_XEM BTC_ZEC -4.196954
## 6 BTC_XEM BTC_LTC -4.134311
## 7 BTC_XEM BTC_DASH -3.992983
## 8 BTC_XEM BTC_ETH -3.942901
## 9 USDT_REP USDT_ZEC -3.830189
## 10 USDT_ZEC USDT_REP -3.700380
## 11 BTC_XEM BTC_XMR -3.571197
## 12 USDT_DASH USDT_REP -3.569527
## 13 USDT_ETH USDT_LTC -3.507121
## 14 USDT_REP USDT_XMR -3.506964
## 15 USDT_REP USDT_DASH -3.495590
## 16 BTC_ZEC BTC_REP -3.443055
plot_many(pricing_data = pricing_data,
time_resolution = time_resolution,
cutoff_date = "2017-08-01",
train_window = train_window,
test_window = test_window,
threshold_z = threshold_z)
## # A tibble: 22 x 3
## coin_y coin_x adf_stat
## <chr> <chr> <dbl>
## 1 USDT_REP USDT_ZEC -6.661288
## 2 USDT_ZEC USDT_REP -6.582965
## 3 USDT_LTC USDT_REP -6.206508
## 4 USDT_REP USDT_LTC -6.087571
## 5 BTC_REP BTC_ZEC -5.652637
## 6 BTC_ZEC BTC_REP -5.358099
## 7 BTC_ETH BTC_ZEC -4.971175
## 8 USDT_ETH USDT_ZEC -4.943033
## 9 USDT_ZEC USDT_ETH -4.931971
## 10 USDT_LTC USDT_ZEC -4.791980
## # ... with 12 more rows
plot_many(pricing_data = pricing_data,
time_resolution = time_resolution,
cutoff_date = "2017-07-01",
train_window = train_window,
test_window = test_window,
threshold_z = threshold_z)
## # A tibble: 45 x 3
## coin_y coin_x adf_stat
## <chr> <chr> <dbl>
## 1 USDT_REP USDT_XMR -7.664187
## 2 USDT_XMR USDT_REP -7.472395
## 3 BTC_XEM BTC_ZEC -6.626656
## 4 BTC_ZEC BTC_XEM -6.546489
## 5 USDT_REP USDT_ZEC -5.654900
## 6 USDT_REP USDT_ETH -5.475040
## 7 BTC_XEM BTC_XMR -5.374262
## 8 BTC_XMR BTC_XEM -5.361753
## 9 BTC_XMR BTC_ETH -5.358647
## 10 BTC_ETH BTC_XMR -5.315250
## # ... with 35 more rows
plot_many(pricing_data = pricing_data,
time_resolution = time_resolution,
cutoff_date = "2017-06-01",
train_window = train_window,
test_window = test_window,
threshold_z = threshold_z)
## # A tibble: 30 x 3
## coin_y coin_x adf_stat
## <chr> <chr> <dbl>
## 1 USDT_REP USDT_XMR -6.451670
## 2 USDT_XMR USDT_REP -6.368368
## 3 USDT_REP USDT_DASH -6.254209
## 4 USDT_XMR USDT_DASH -6.128140
## 5 USDT_DASH USDT_REP -6.123690
## 6 USDT_DASH USDT_XMR -6.080574
## 7 USDT_BTC USDT_XMR -4.972551
## 8 BTC_XMR BTC_REP -4.954575
## 9 USDT_BTC USDT_REP -4.902192
## 10 BTC_REP BTC_XMR -4.863485
## # ... with 20 more rows
plot_many(pricing_data = pricing_data,
time_resolution = time_resolution,
cutoff_date = "2017-05-01",
train_window = train_window,
test_window = test_window,
threshold_z = threshold_z)
## # A tibble: 16 x 3
## coin_y coin_x adf_stat
## <chr> <chr> <dbl>
## 1 USDT_XMR USDT_DASH -5.339833
## 2 USDT_DASH USDT_ZEC -5.308115
## 3 USDT_DASH USDT_XMR -5.282598
## 4 USDT_XMR USDT_ZEC -5.234008
## 5 USDT_ZEC USDT_DASH -5.214210
## 6 USDT_ZEC USDT_XMR -5.082341
## 7 USDT_XMR USDT_ETH -4.583268
## 8 BTC_DASH BTC_ZEC -4.387828
## 9 BTC_ZEC BTC_DASH -4.239062
## 10 USDT_ZEC USDT_ETH -4.228714
## 11 USDT_ETH USDT_XMR -4.216820
## 12 USDT_ETH USDT_ZEC -4.024842
## 13 USDT_REP USDT_ZEC -3.751986
## 14 USDT_REP USDT_ETH -3.606797
## 15 USDT_REP USDT_DASH -3.559779
## 16 USDT_ZEC USDT_REP -3.448683
plot_many(pricing_data = pricing_data,
time_resolution = time_resolution,
cutoff_date = "2017-04-01",
train_window = train_window,
test_window = test_window,
threshold_z = threshold_z)
## # A tibble: 32 x 3
## coin_y coin_x adf_stat
## <chr> <chr> <dbl>
## 1 BTC_XMR BTC_DASH -6.064993
## 2 USDT_REP USDT_DASH -5.516708
## 3 USDT_DASH USDT_REP -5.202461
## 4 USDT_XMR USDT_ETH -4.936369
## 5 BTC_XMR BTC_ZEC -4.863836
## 6 USDT_XMR USDT_DASH -4.637342
## 7 BTC_DASH BTC_XMR -4.588895
## 8 USDT_REP USDT_LTC -4.461010
## 9 BTC_XMR BTC_LTC -4.442602
## 10 BTC_XMR BTC_REP -4.391951
## # ... with 22 more rows
plot_many(pricing_data = pricing_data,
time_resolution = time_resolution,
cutoff_date = "2017-03-01",
train_window = train_window,
test_window = test_window,
threshold_z = threshold_z)
## # A tibble: 15 x 3
## coin_y coin_x adf_stat
## <chr> <chr> <dbl>
## 1 USDT_LTC USDT_BTC -5.968538
## 2 USDT_LTC USDT_DASH -5.490427
## 3 USDT_LTC USDT_ZEC -5.365049
## 4 USDT_LTC USDT_REP -5.348822
## 5 USDT_LTC USDT_ETH -5.235835
## 6 USDT_LTC USDT_XMR -5.220467
## 7 BTC_XMR BTC_LTC -4.334343
## 8 BTC_XEM BTC_XMR -4.327788
## 9 BTC_LTC BTC_XMR -4.271802
## 10 BTC_XEM BTC_LTC -4.149340
## 11 BTC_XMR BTC_XEM -4.083924
## 12 BTC_REP BTC_XMR -3.933374
## 13 BTC_XMR BTC_REP -3.891252
## 14 BTC_LTC BTC_XEM -3.854678
## 15 USDT_XMR USDT_BTC -3.490824
plot_many(pricing_data = pricing_data,
time_resolution = time_resolution,
cutoff_date = "2017-02-01",
train_window = train_window,
test_window = test_window,
threshold_z = threshold_z)
## # A tibble: 36 x 3
## coin_y coin_x adf_stat
## <chr> <chr> <dbl>
## 1 BTC_ETH BTC_REP -5.642071
## 2 BTC_REP BTC_ETH -5.523970
## 3 BTC_REP BTC_XEM -5.523347
## 4 USDT_DASH USDT_XMR -4.889190
## 5 BTC_REP BTC_ZEC -4.888025
## 6 USDT_ETH USDT_BTC -4.833015
## 7 BTC_XEM BTC_REP -4.816575
## 8 USDT_XMR USDT_DASH -4.811134
## 9 USDT_XMR USDT_BTC -4.588548
## 10 USDT_BTC USDT_ETH -4.579627
## # ... with 26 more rows
plot_many(pricing_data = pricing_data,
time_resolution = time_resolution,
cutoff_date = "2017-01-01",
train_window = train_window,
test_window = test_window,
threshold_z = threshold_z)
## # A tibble: 19 x 3
## coin_y coin_x adf_stat
## <chr> <chr> <dbl>
## 1 BTC_XEM BTC_DASH -4.971811
## 2 BTC_DASH BTC_XEM -4.903514
## 3 BTC_XEM BTC_ETH -4.652359
## 4 BTC_ETH BTC_XEM -4.504809
## 5 USDT_DASH USDT_REP -4.303517
## 6 USDT_ZEC USDT_BTC -4.182743
## 7 USDT_REP USDT_DASH -4.002864
## 8 BTC_DASH BTC_ETH -4.002584
## 9 BTC_ETH BTC_DASH -3.911101
## 10 USDT_BTC USDT_ZEC -3.782933
## 11 BTC_ZEC BTC_REP -3.747353
## 12 USDT_ETH USDT_DASH -3.746247
## 13 BTC_ZEC BTC_XEM -3.732848
## 14 USDT_ZEC USDT_LTC -3.676975
## 15 BTC_ZEC BTC_DASH -3.652745
## 16 USDT_REP USDT_XMR -3.606307
## 17 BTC_REP BTC_ZEC -3.533191
## 18 BTC_ZEC BTC_ETH -3.506434
## 19 USDT_DASH USDT_ETH -3.481783
cutoff_dates <- seq(ymd("2017-01-01"), ymd("2017-10-01"), by = test_by)
results <- tibble()
for (cutoff_date in cutoff_dates) {
cutoff_date <- as.Date(cutoff_date)
print(str_c("Cross validating strategy."))
print(str_c("Using train set from ", cutoff_date - train_window , " to ", cutoff_date, "."))
print(str_c("Using test set from ", cutoff_date, " to ", cutoff_date + test_window, "."))
train <- prepare_data(pricing_data = pricing_data,
time_resolution = time_resolution,
start_date = cutoff_date - train_window,
end_date = cutoff_date)
test <- prepare_data(pricing_data = pricing_data,
time_resolution = time_resolution,
start_date = cutoff_date,
end_date = cutoff_date + test_window)
test <- test %>%
mutate(return_strategy = backtest_strategy(train = train,
test = test,
selected_pairs = select_pairs(train = train, coin_pairs = create_pairs()),
threshold_z = threshold_z),
return_strategy_change = return_strategy / lag(return_strategy, 1) - 1) %>%
mutate_all(funs(ifelse(is.na(.), 0, .)))
results <- bind_rows(results, test)
}
## [1] "Cross validating strategy."
## [1] "Using train set from 2016-12-16 to 2017-01-01."
## [1] "Using test set from 2017-01-01 to 2017-01-09."
## [1] "Cross validating strategy."
## [1] "Using train set from 2016-12-24 to 2017-01-09."
## [1] "Using test set from 2017-01-09 to 2017-01-17."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-01-01 to 2017-01-17."
## [1] "Using test set from 2017-01-17 to 2017-01-25."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-01-09 to 2017-01-25."
## [1] "Using test set from 2017-01-25 to 2017-02-02."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-01-17 to 2017-02-02."
## [1] "Using test set from 2017-02-02 to 2017-02-10."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-01-25 to 2017-02-10."
## [1] "Using test set from 2017-02-10 to 2017-02-18."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-02-02 to 2017-02-18."
## [1] "Using test set from 2017-02-18 to 2017-02-26."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-02-10 to 2017-02-26."
## [1] "Using test set from 2017-02-26 to 2017-03-06."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-02-18 to 2017-03-06."
## [1] "Using test set from 2017-03-06 to 2017-03-14."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-02-26 to 2017-03-14."
## [1] "Using test set from 2017-03-14 to 2017-03-22."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-03-06 to 2017-03-22."
## [1] "Using test set from 2017-03-22 to 2017-03-30."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-03-14 to 2017-03-30."
## [1] "Using test set from 2017-03-30 to 2017-04-07."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-03-22 to 2017-04-07."
## [1] "Using test set from 2017-04-07 to 2017-04-15."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-03-30 to 2017-04-15."
## [1] "Using test set from 2017-04-15 to 2017-04-23."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-04-07 to 2017-04-23."
## [1] "Using test set from 2017-04-23 to 2017-05-01."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-04-15 to 2017-05-01."
## [1] "Using test set from 2017-05-01 to 2017-05-09."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-04-23 to 2017-05-09."
## [1] "Using test set from 2017-05-09 to 2017-05-17."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-05-01 to 2017-05-17."
## [1] "Using test set from 2017-05-17 to 2017-05-25."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-05-09 to 2017-05-25."
## [1] "Using test set from 2017-05-25 to 2017-06-02."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-05-17 to 2017-06-02."
## [1] "Using test set from 2017-06-02 to 2017-06-10."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-05-25 to 2017-06-10."
## [1] "Using test set from 2017-06-10 to 2017-06-18."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-06-02 to 2017-06-18."
## [1] "Using test set from 2017-06-18 to 2017-06-26."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-06-10 to 2017-06-26."
## [1] "Using test set from 2017-06-26 to 2017-07-04."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-06-18 to 2017-07-04."
## [1] "Using test set from 2017-07-04 to 2017-07-12."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-06-26 to 2017-07-12."
## [1] "Using test set from 2017-07-12 to 2017-07-20."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-07-04 to 2017-07-20."
## [1] "Using test set from 2017-07-20 to 2017-07-28."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-07-12 to 2017-07-28."
## [1] "Using test set from 2017-07-28 to 2017-08-05."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-07-20 to 2017-08-05."
## [1] "Using test set from 2017-08-05 to 2017-08-13."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-07-28 to 2017-08-13."
## [1] "Using test set from 2017-08-13 to 2017-08-21."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-08-05 to 2017-08-21."
## [1] "Using test set from 2017-08-21 to 2017-08-29."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-08-13 to 2017-08-29."
## [1] "Using test set from 2017-08-29 to 2017-09-06."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-08-21 to 2017-09-06."
## [1] "Using test set from 2017-09-06 to 2017-09-14."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-08-29 to 2017-09-14."
## [1] "Using test set from 2017-09-14 to 2017-09-22."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-09-06 to 2017-09-22."
## [1] "Using test set from 2017-09-22 to 2017-09-30."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-09-14 to 2017-09-30."
## [1] "Using test set from 2017-09-30 to 2017-10-08."
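The rolling windows logged above can be reproduced with plain date arithmetic. The sketch below uses base R `Date` objects and integer day counts in place of lubridate periods; `train_days`, `test_days`, and `windows` are names assumed for this example.

```r
train_days <- 16
test_days  <- 8

# One cutoff every 8 days, so consecutive test windows tile the year.
cutoffs <- seq(as.Date("2017-01-01"), as.Date("2017-10-01"), by = "8 days")

windows <- data.frame(
  train_start = cutoffs - train_days,
  cutoff      = cutoffs,              # end of train set and start of test set
  test_end    = cutoffs + test_days
)
print(head(windows, 3))
print(nrow(windows))
```

Because the step between cutoffs equals the test window length, each test set begins where the previous one ended, while the 16-day train sets overlap by 8 days.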
results <- results %>%
mutate(return_strategy_cumulative = cumprod(1 + return_strategy_change),
date_time = as.POSIXct(date_time, origin = "1970-01-01"))
ggplot(results, aes(x = date_time)) +
geom_line(aes(y = return_strategy_cumulative), colour = "blue", size = 1) +
geom_hline(yintercept = 1, colour = "black") +
labs(title = "Strategy Return vs Buy Hold Return", x = "Date", y = "Cumulative Return")
print(results[["return_strategy_cumulative"]][nrow(results)])
## [1] 0.5997211
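The way per-window curves are chained into a single cumulative return can be sketched with two made-up windows. Each window's curve restarts at 1, so it is first converted into step-by-step changes and the changes are then compounded across windows. The curves and the helper `step_change()` are assumptions for illustration.

```r
# Toy cumulative return curves from two consecutive test windows.
window_1 <- c(1.00, 1.10, 1.21)
window_2 <- c(1.00, 0.90)

# Per-step change within a window; the first step has no predecessor, so it is 0.
step_change <- function(curve) {
  change <- curve / c(NA, head(curve, -1)) - 1
  ifelse(is.na(change), 0, change)
}

changes <- c(step_change(window_1), step_change(window_2))
chained <- cumprod(1 + changes)
print(chained)
```

The second window's 10% loss is applied to the 1.21 reached at the end of the first window, giving a final value of 1.089.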